HPerf: A Lightweight Profiler for Task Distribution on CPU+GPU Platforms
Authors
Abstract
Heterogeneous computing has emerged as one of the major computing platforms in many domains. Although there have been several proposals to aid programming for heterogeneous computing platforms, optimizing applications on them is not an easy task. Identifying which parallel regions (or tasks) should run on GPUs or CPUs is one of the critical decisions for improving performance. In this paper, we propose a profiler, HPerf, to identify an efficient task distribution on CPU+GPU systems with low profiling overhead. HPerf is a hierarchical profiler: it first performs lightweight profiling and then, if necessary, performs detailed profiling to measure caching and data transfer costs. Compared to a brute-force approach, HPerf reduces the profiling overhead significantly, and compared to a naive decision, HPerf improves the performance of OpenCL applications by up to 25%.
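The two-level idea in the abstract can be illustrated with a minimal Python sketch. It is not HPerf's actual algorithm: the `Estimate` type, the `choose_device` function, the `detailed_probe` callback, and the 20% `margin` threshold are all hypothetical names and values chosen for illustration. The sketch assumes that a cheap estimate is trusted when one device wins clearly, and that the expensive probe (whose numbers are assumed to already include caching and data transfer costs) runs only when the cheap estimates are close.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Estimate:
    """Hypothetical per-task time estimates in milliseconds."""
    cpu_ms: float
    gpu_ms: float


def choose_device(
    light: Estimate,
    detailed_probe: Callable[[], Estimate],
    margin: float = 0.2,  # assumed threshold, not taken from the paper
) -> Tuple[str, bool]:
    """Return (device, used_detailed_profiling) for one task."""
    faster = min(light.cpu_ms, light.gpu_ms)
    slower = max(light.cpu_ms, light.gpu_ms)

    # Level 1: if the lightweight estimate shows a clear winner, stop here.
    if slower > 0 and (slower - faster) / slower > margin:
        return ("cpu" if light.cpu_ms < light.gpu_ms else "gpu"), False

    # Level 2: the estimates are too close, so pay for the detailed probe,
    # which is assumed to fold caching and data transfer costs into its times.
    detailed = detailed_probe()
    return ("cpu" if detailed.cpu_ms <= detailed.gpu_ms else "gpu"), True
```

For example, `choose_device(Estimate(10.0, 2.0), probe)` decides from the lightweight estimate alone and never calls `probe`, whereas `Estimate(10.0, 9.0)` falls within the margin and triggers the detailed probe.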
Similar resources
TREES: A CPU/GPU Task-Parallel Runtime with Explicit Epoch Synchronization
We have developed a task-parallel runtime system, called TREES, that is designed for high performance on CPU/GPU platforms. On platforms with multiple CPUs, Cilk's "work-first" principle underlies how task-parallel applications can achieve performance, but work-first is a poor fit for GPUs. We build upon work-first to create the "work-together" principle that addresses the specific strengt...
Performance and Energy Aware Workload Partitioning on Heterogeneous Platforms
Heterogeneous platforms which employ a mix of CPUs and accelerators such as GPUs have been widely used in the high-performance computing area [1]. Such heterogeneous platforms have the potential to offer higher performance at lower energy cost than homogeneous platforms. However, it is rather challenging to actually achieve the high performance and energy efficiency promised by heterogeneous pl...
A Parallel Twig Join Algorithm for XML Processing using a GPGPU
With an increasing amount of data and demand for fast query processing, the efficiency of database operations continues to be a challenging task. A common approach is to leverage parallel hardware platforms. With the introduction of general-purpose GPU (Graphics Processing Unit) computing, massively parallel hardware has become available within commodity hardware. XML is based on a tree-structu...
Efficient CPU-GPU cooperative computing for solving the subset-sum problem
A heterogeneous CPU-GPU system is a powerful way to accelerate compute-intensive applications, such as the subset-sum problem. Many parallel algorithms for solving the problem have been implemented on graphics processing units (GPUs). However, these GPU implementations may fail to fully utilize all the CPU cores and the GPU resources. When the GPU performs a computational task, only one CPU core is...
Ultra-Fast Image Reconstruction of Tomosynthesis Mammography Using GPU
Digital Breast Tomosynthesis (DBT) is a technology that creates three-dimensional (3D) images of breast tissue. Tomosynthesis mammography detects lesions that are not detectable with other imaging systems. If image reconstruction time is on the order of seconds, tomosynthesis systems can be used to perform tomosynthesis-guided interventional procedures. This research has been designed to study u...